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Abstract. We present a new transform triple dqds to help to compute the eigenvalues of a real tridiagonal matrix 
C using real arithmetic. The algorithm uses the real dqds transform to shift by a real number and tridqds to shift 
by a complex conjugate pair. We present what seems to be a new criteria for splitting the current pair L, U. The 
algorithm rejects any transform which suffers from excessive element growth and then tries a new transform. Our 
numerical tests show that the algorithm is about 100 times faster than the Ehrlich-Aberth method of D. A. Bini, L. 
Gemignani and F. Tisseur. Our code is comparable in performance to a complex dqds code and is sometimes 3 times 
faster. 
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1. Introduction. The dqds algorithm was introduced in 1994 in as a fast and extremely 
accurate way to compute all the singular values of a bidiagonal matrix B. This algorithm implicitly 
performs the Cholesky LR iteration on the tridiagonal matrix B T B and it is used in LAPACK. 
However the dqds algorithm can also be regarded as executing, implicitly, the LR algorithm applied 
to any tridiagonal matrix with l's on the superdiagonal. Our interest is in real matrices which may 
have complex conjugate pairs of eigenvalues. It is natural to try to retain real arithmetic and yet 
permit complex shifts of origin. Our analogue of the double shift QR algorithm of J. G. F. Francis 
[12] is the triple step dqds algorithm. The purpose of this paper is to explain why 3 steps are needed 
to derive the algorithm, to explain how we reject transforms with unacceptable element growth and 
to compare performance with some rival methods. Our conclusion is that this procedure is clearly 
the fastest method available at the present time. 

We say nothing about the need for a tridiagonal eigensolver because this issue is admirably 
covered in Bini, Gemignani and Tisseur pQ. In fact many parts of [1] have been of great help to us. 
We also acknowledge the preliminary work on this problem by Z. Wu in [23 . 

We do not follow Householder conventions except that we reserve capital Roman letters for 
matrices. Section [5] describes other methods, Section [3] presents standard, but needed, material 
on LR, dqds, double shifts and the implicit L theorem. Section [4] develops our tridqds algorithm, 
Section [5] is our error analysis, Section [6] our splitting, deflation and shift strategy, and Section [7] 
presents our numerical tests using Matlab. Finally, Section [8] gives our conclusions and also our 
ideas about why tridqds is only one (important) ingredient for a procedure that must also provide 
condition numbers and eigenvectors. 

2. Other methods. 

2.1. 2 steps of LR = 1 step of QR. A frequent exercise for students is to show that for 
a symmetric positive definite tridiagonal matrix 2 steps of the LR (Cholesky) algorithm produces 
the same matrix as 1 step of the QR algorithm. Less well known is the article by H. Xu [24] which 
extends this result when the symmetric matrix is not positive definite. The catch here is that the LR 



* The first author is supported by FEDER Funds through "Programa Operacional Factores de Competitividade - 
COMPETE" and by Portuguese Funds through FCT - "Fundacao para a Ciencia e a Tecnologia", within the Project 
PEst-CMAT /UI0013/201 1 . 

^Mathematics and Applications Department, University of Minho, 4710-057 Braga (caferrei@math.uminho.pt) 
+ Department of Mathematics and the Computer Science Division of the Electrical Engineering and Computer 
Science Department, University of California, Berkeley, California 94720 (parlettamath.berkeley.edu) 



1 



2 



C. Ferreira and B. Parlett 



transform, if it exists, does not preserve symmetry. The remedy is to regard similarities by diagonal 
matrices as "trivial" , always available, operations. Indeed, diagonal similarities cannot introduce 
zeros into a matrix. So, when successful, 2 steps os LR are diagonally similar to one step of QR. 
Even less well known is a short paper by J. Slemons |20) showing that for a tridiagonal matrix, not 
necessarily symmetric, 2 steps of of LR are diagonally equivalent to 1 step of HR, see [2]. Note 
that when symmetry disappears then QR is out of the running because it does not preserve the 
tridiagonal property. 

The point of listing these results is to emphasize that 2 steps of LR gives twice as many shift 
opportunities as 1 step of QR or HR. Thus convergence can be more rapid with LR (or dqds) than 
with QR or HR. This is one of the reasons that dqds is faster than QR for computing singular values 
of bidiagonals. This extra speed is an additional bonus to the fundamental advantage that dqds 
delivers high relative accuracy in all the singular values. The one drawback to dqds, for bidiagonals, 
is that the singular values must be computed in monotone increasing order; QR allows the singular 
values to be found in any order. 

In our case, failure is always possible and so there is no constraint on the order in which 
eigenvalues are found. The feature of having more opportunities to shift leads us to favor dqds over 
QR and HR. See the list of other methods which follows. We take up the methods in historical order 
and consider only those that preserve tridiagonal form. 

2.2. Cullum's complex QR algorithm. As part of a program that used the Lanczos al- 
gorithm to reduce a given matrix to tridiagonal form in [4 , Jane Cullum used the fact that an 
unsymmetric tridiagonal matrix may always be balanced by a diagonal similarity transformation. 
She then observed that another diagonal similarity with 1 or i produces a symmetric, but complex, 
tridiagonal matrix to which the (complex) tridiagonal QR algorithm may be applied. The process 
is not backward stable because the relation 

cos 2 t + sin 2 t = 1 

is not constraint on | cosr| and | sinr| when they are not real. Despite the possibility of breakdown 
the method proved satisfactory in practice. We have not used it in our comparisons because we are 
persuaded by 12.11 that it is out performed by the complex dqds algorithm, described below. 

2.3. Liu's HR algorithm. In [13 Alex Liu found a variation on the HR algorithm of Angelika 
Bunse-Gerstner that, in exact arithmetic, is guaranteed not to breakdown - but the price is a 
temporary increase in bandwith. This procedure has only been implemented in Maple and we do 
not include it in our comparison. 

2.4. Complex dqds. In his thesis David Day [5j developed a Lanczos-style algorithm to reduce 
a general matrix to tridiagonal form and, as with Jane Cullum, needed a suitable algorithm to 
compute its eigenvalues. He knew of the effectiveness of dqds in the symmetric positive definite 
case and realized that dqds extends formally to any tridiagonal that admits triangular factorization. 
Without positivity the attractive property of achieving high relative accuracy disappears but, despite 
possible element growth, the error analysis for dqds persists: if the transform does not breakdown 
then tiny well chosen changes in the entries of input L, U (giving L, U) and output L, U (giving 
L, U) produces an exact relation 

LU = LU- al 

with the given shift a. See Section [57X1 The code uses complex arithmetic because of the possible 
presence of complex conjugate pairs of eigenvalues. We have wrapped David Day's complex dqds 
code in a more sophisticated wrapper that chooses suitable shifts after rejecting a transform for 
excessive element growth. 
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2.5. Ehrlich-Aberth algorithm. This very careful and accurate procedure was presented by 
Bini, Gemignani and Tisseur in pQ. It finds the zeros of the characteristic polynomial p(-) and 
exploits the tridiagonal form to evaluate p'(z)/p(z) for any z. The polynomial solver improves a full 
set of approximate zeros at each step. Initial approximations are found using a divide-and-conquer 
procedure that delivers the eigenvalues of the top and bottom halves of the matrix T. The quantity 
p'(z)/p(z) is evaluated indirectly as [trace(z/ — T) _1 ] using a QR factorization of zI — T. Since T is 
not altered there is no deflation to assist efficiency Very careful tests exhibit the method's accuracy 
- but it is very slow compared to both dqds-type algorithms. 

3. LR and dqds. The reader is expected to have had some exposure to the QR and/or LR 
algorithms so we will be brief. 

3.1. LU factorization. Any n x n matrix A permits unique triangular factorization A = LDU 
where L is unit lower triangular, D is diagonal, U is unit upper triangular, if and only if the leading 
principal submatrices of orders 1, . . . , n — 1 are nonsingular. 

In this paper we follow common practice and write U = DU so that the "pivots" (entries of D) 
lie on t/'s diagonal. Throughout this paper any matrix L is unit lower triangular and U is upper 
triangular. 

3.2. LR transform with shift. Note that U is "right" triangular and L is "left" triangular 
and this explains the standard name LR. For any shift a let 

A-al = LU, (3.1) 
A = UL + aI. (3.2) 

Then A is the LR(cr) transform of A. Note that 

A = L~ l {A - aI)L + al = L^AL. 

We say that the shift is restored (in contrast to dqds - see below). The LR algorithm consists of 
repeated LR transforms with shifts chosen to enhance convergence to upper triangular form. For 
the theory see QH QH HU [22] . 

In contrast to the well known QR algorithm, the LR algorithm can breakdown and can suffer 
from element growth, ||L|| >> ||A||, \\U\\ >> \\A\\. However LR preserves the banded form of A 
while QR does not (except for the Hessenberg form). 

When a matrix A is represented by its entries then the shift operation A — > A — al is trivial. 
When a matrix is given in factored form the shift operation is not trivial. 

3.3. The dqds algorithm. From now on we focus on tridiagonal matrices in J- form - entries 
(i, i + 1) are all 1, i = 1, . . . , n — 1. Throughout this paper all J matrices have this form. 

If J — al permits triangular factorization 

J -al = LU 



then L and U must have the following form 



1 

h 1 



ln-2 1 



u ■■ 



1*1 1 

"2 



(3.3) 



Un-i 1 
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It is an attractive feature of LR that 

UL = J 

is also of J-form. Thus the parameters U, i = 1, ... ,n — 1, and uj, j — 1, . . . , n, determine the 
matrices L and U above and implicitly define two tridiagonal matrices LU and UL. 

The qds algorithm is equivalent to the LR algorithm but no tridiagonal matrices are ever formed. 
The progressive transformation is from L, U to L, U, 

LU = UL- al. (3.4) 

Notice that the shift is not restored and so UL is not similar to UL. 
Equating entries in each side of equation p.4[) gives 

qds(a) : Ui = U\ + l\ — <r; 

for i = 1, . . . , n — 1 

k = liU l+ l/Ui 

Ui+l = Ui+l + Zj+l - <7 - Ij 

end for. 

The algorithm qds fails when Ui = for some i < n. When a — we write simply qd, not qds. 

In 1994 a better way was found to implement qds(tr) that had been used by Rutishauser as early 
as 1955. These are called differential qd algorithms. See [TS] for more history. This form uses uses 
an extra variable d but has compensating advantages. 

dqds((j) : d\ = u\ — a 

for i = 1, . . . , n — 1 

Ui = di + k 
U = k{u i+ i/ui) 
di+i = di(u i+1 /ui) - a 
end for 

d n . 

By definition, dqd=dqds(0). 

A word on terminology. In Rutishauser's original work qi = Ui, = U; the q^s were certain 
quotients and and the e^'s were called modified differences. In fact the qd algorithm led to the LR 
algorithm, not vice- versa. The reader can find more information concerning dqds in |15i 116] 

One virtue of the dqds and QR transforms is that they work on the whole matrix so that large 
eigenvalues are converging near the top, albeit slowly, while the small ones are being picked off at 
the bottom. 

We summarize some advantages and disadvantages of the factored form. 

Advantages of the factored form 

1. L, U determines the entries of J to greater than working-precision accuracy because the 
addition and multiplication of Vs and it's is implicit. Thus, for instance, the (i, i) entry of 
J is given by + Ui implicitly but fl(li—\ + m) explicitly. 

2. Singularity of J is detectable by inspection when L and U are given, but only by calculation 
from J. So, LU reveals singularity, J does not. 

3. LU defines the eigenvalues better than J does (usually). There is more on this in [5]. 
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4. Solution of Jx = b takes half the time when L and U are available. 

Disadvantages of the factored form 

The mapping J, a t-» L, U is not everywhere defined for all pairs J, er and can suffer from element 
growth. This defect is not as serious as it was when the new transforms were written over the old 
ones. For tridiagonals we can afford to double the storage and map L, U into different arrays L, U . 
Then we can decide whether or not to accept L, U and only then would L and U be overwritten. So 
the difficulty of excessive element growth has been changed from disaster to the non-trivial but less 
intimidating one of, after rejecting a transform, choosing a new shift that will not spoil convergence 
and will not cause another rejection. 

Now we turn to our main question of clqds(a): how can complex shifts be used without having 
to use complex arithmetic? This question has a beautiful answer for QR and LR iterations. 

3.4. Double shift LR algorithm. We use the J, L and U notation from the previous section. 
Consider two steps of the LR algorithm with shifts <j\ and ct 2 , 



Then 



with 



and 



o\l = L 2 U 2 

<h = U2L2 + o\I 
(T2I = 

J4 = U3L3 + o\l. 



J 4 ^C' 1 J 2 C (3.5) 



L 2 L 3l U = U 3 U 2 



CM = L 2 (J 3 - <r 2 I)U 2 

= L 2 (U 2 L2 + (T\I)U-2 - CJ2L2U2 
= L2V2 [L 2 U 2 + Oi - CT 2 )J] 

= Jl - (<7i + 02) J 2 + 0-102/ =: M (3.6) 

Suppose that J2 is real and o\ is complex. Then J4 will be real if, and only if, 02 = a. The reason 
is that M is real, so that C and U are real and, by (|3.5I) . J4 is the product of real matrices. Note 
however that L 2 , U2, L3, U3 are all complex. Fortunately it is possible to compute J4 from J2 without 
using J 3 . This depends on the following result. 

Theorem 3.1. [Implicit L theorem] If Hi and H 2 are unreduced upper Hessenberg matrices 
and H 2 = L^ 1 HiL, where L is unit lower triangular, then H2 and L are completely determined by 
Hi and column 1 of L, Le\ . We omit the proof. 

The clever application to J2 and J4 is to observe that column 1 of M, 



() 
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is proportional to column 1 of L and has only three nonzero entries below the diagonal because J 2 
is tridiagonal. Now choose 



£1 = I + mei J 



where 



m = [0 m2i/?nii TO3i/mn ... 0] 
and perform an explicit similarity transform on J 2 , 

£^J 2 £i =: K. 
Observe that K is not tridiagonal. In the 6x6 case 



K 



1 

x 
x 



Next we apply a sequence of elementary similarity transformations such that each transformation 
pushes the 2x1 bulge one row down and one column to the right. Finally the bulge is chased off 
the bottom to restore the J-form. In exact arithmetic, the implicit L theorem ensures that this 
technique of bulge chasing gives 

J 4 = (£1 • • • £„-i) _1 J 2 (£i . . .C n -i) and £ = £!...£„_!. 

4. Triple dqds algorithm. 

4.1. Connection to LR algorithm. In figure I4TT1 we examine the double shift LR transform 
derived in section T3.4I but with a significant difference. Instead of J 2 being an arbitrary real matrix 
in J-form, we assume that it is given to us in the form UiL\ obtained from one step of the LR 
algorithm with shift from real J\. 



Ji 



LR(0) 



LR(ct) 



Js 



LR(ct) 



Ja 







Li,Ui 



L 2 ,U 2 L 3 ,U 3 J* L 4 ,C/ 4 

dqds(er) dqds(cr — a) dqds(— a) 



Figure 4.1. Double shift LR and three steps 0} dqds 

The crucial observation is that, along the bottom line L 2l U2, £3, U3 are all complex and so it 
requires 3 dqds steps to go from real L\,Ui to real L4, U4. Moreover the non- restoring shifts in dqds 
are 



a — 0, a — u. — <t. 
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Here is another way of seeing the relation between LR and dqds: 

I Ji= LiUj. 
\ J 2 = UxL x 

f J 2 -aI= L 2 U 2 L 2 U 2 ^U 1 L 1 -aI 
\ J 3 = U 2 L 2 + a I 

( J 3 -aI= L 3 U 3 L 3 U 3 = (U 2 L 2 + a I) - al 
\ J 4 = U 3 L 3 + aI 

| J 4 = L 4 U 4 L 4 U 4 = {U 3 L 3 + WI) - 01 

Recall from the previous section that the double LR algorithm can work with complex shifts in real 
arithmetic by bulge chasing. The rest of this section developes a form of bulge chasing for the dqds 
algorithm. 

4.2. 3 steps of dqds. In contrast to a single dqds step our tripJe dqds restores the shifts. 
Recall from Q3.5P in section [5T4l that 

LiUi^ J 4 = CT 1 J 2 C = CT^UxLxL (4.1) 

and, since o\ — a and er 2 = a, matrix M in (I3.6[) is given by 

M = (f/iii) 2 - 2{lka l )U l L l + |o-i| 2 /. (4.2) 

The idea is to transform U\ into L 4 and L\ into U4 by bulge chasing in each matrix, 

Notice that we need to transform an upper bidiagonal into a lower bidiagonal and vice-versa. From 
the uniqueness of the LU factorization, when it exists, it follows that there is a unique hidden matrix 
X such that 

Li = cr x u x x- x , XL X C = Ui. 

For more on X see [13] . The matrix C is given, from section 13.41 as a product 

C = A--- Hn-xCn 

(C n = I) and we will gradually construct the matrix X in corresponding factored form X n , . . . , X 2 X\ . 
In fact we will write each Xi as a product 

Xi = YiZ{. 



The details are quite complicated. 
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4.2.1. Chasing the bulges. Starting with the factors L l5 U\ and the shift cr, we normalize 
column 1 of M in (|4.2[) to form C\ , spoil the bidiagonal form with 

L^UxLxCi 

and at each minor step i, i = 1, . . . , n, matrices Zj, £i and Y t are chosen to chase the bulges. After 
n minor steps, we obtain L4 and t/4, 

L4U4 — C n 1 • • • C 1 1 U\Z 1 1 Y 1 1 • • • Z n Y n 1 Y n Z n ■ ■ ■ Y\Z\L\L\ ■ ■ ■ C n 



: £-n 1 ' ' ' ^1 1 ' ' ' -^n 1 X n ■ ■ ■ X\L\L\ ■ ■ ■ C r . 



= cr 1 UiX- 1 XL 1 c 

Conceptually we create two work arrays F and G. Initially, 

F = Ui, G = L l 

and, finally, 

F = U, G = U 4 . 

For a complex shift cr, the triple dqds algorithm has the following matrix formulation: 
tridqds(cr, a) : 

F = U 1 ; G = L 1 

F = FZ1 1 ; G = Z X G 

F = £~ 1 F; G = Gd [form d using (42}] 

F = FYf 1 ; G = YiG 

for i = 2, . . . , n — 3 

F = FZr 1 - G = ZiG 
F = C^F; G = Gd 

F = FY^ 1 : G = YiG [Zi with one, d with two and Yi with three 

nonzero off-diagonal entries] 



[y„_2 with two nonzero off-diagonal entries] 



end for 




% step n-2 




F = FZ n-2i G = 




F = C-^F; G = 


- GC n -2 


F = FY~} 2 ; G = 


- Y n -iG 


% step n-1 




F = FZ~_ X \ G = 


- Z-n-iG 


F = £-\F; G = 


- GC n -i 


F = FY-_\; G = 


- Y n -\G 


% step n 




C — J- Y — J 




F = FZ- 1 ; G = 


Z n G 


L4 = F; F4 = G 





[Y n _i and with one nonzero off-diagonal entry] 



[Z n diagonal] 
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4.2.2. Details of tridqds. In this section we will go into the details of the tridqds algo- 
rithm described in the previous section. Consider L\ with subdiagonal entries li, . . . , l n -\ and U\ 
with diagonal entries u\, . . . ,u„, as defined in Section 13.31 and consider matrices L4 and C/4 with 
subdiagonal entries l±, . . . , l n —i and diagonal entries ux, . . . , u n , respectively. 

For each iteration of tridqds, at the beginning of a minor step i, i = 2, ...,n — 2, the active 
4x4 windows of F and G are 



F = 



1 

k-l Ui 1 

+ u i+ i 1 

+ U l+2 



G = 



Ui-l 1 



+ h+l 



(4.3) 



Each minor step i, i — 2, ... ,n — 3, consists of the following 3 parts. 
Minor step i 

a) F < — FZ^ 1 puts into and 1 into F^i 

G i — ZiG turns Gi 4+1 into 1 



Z7 1 = 



Zi = 



Hi 





FZr 



li-i 1 







1 



1 

* 1 

+ 1 

+ h+i 



b) F < — C i F puts in F l+M _i and F i+2 ,i-i 

G -s — GCi defines ui and creates 3 nonzeros below it 
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£7 l 



1 

* 1 

* 1 



I + xei 



Ci I X€.,- L , 



C7 1 F = 



1 

h-i 



H+l 



1 

U i+ 2 



Gd = 



1 

k+2 



c) G i — YiG puts in Gt+i.i, Gi+2,i and G l+ 3^ 

F i — FY^ 1 creates Zj and puts 2 nonzeros below it 



X-- 1 = 



1 

* 1 

* 1 



= I + yei, Yi = I- yej 



fy; 



1 

h u l+ i 1 

+ u l+2 1 

+ Ui+3 



YG 



Ui 1 



+ h+2 1 



The result of this minor step is that the active windows of F and G shown in (I4.3[) have been 
moved down and to the right by one place. See Appendix [X] for more details on the practical 
implementation. 

Naturally steps l,n — 2,n — 1, n are slightly different and may be found on pp. 147-157 of [5]. 
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4.3. Operation count for tridqds. In this section we will see how three steps of simple 
dqds algorithm compares with one step of tridqds in what respects to the number of floating point 
operations required. 

Here is the inner loop of tridqds. See Appendix iBl 

tridqds(a, a) : 

for i = 2, . . . , n — 3 

xi = -xi * (1/li-x); Vi = -yi * 0-/h-i); 

Hi X r — X\ , 

X r =Vr ~ Xf, y r = Z r - J/J - Xj * k+f, Z r = -t/j * l i+2 

x r = x r *(l/ui); y r =y r * (1/ui); z r = z r *(l/iii) 

k = xi + y r + x r * m+i 

xi = yi + z r + y r * u l+2 ; yi = z r * u i+3 

X r 1 X r , y r yr: z r z r 

end for 

A good compiler recognizes common subexpressions. 
In contrast, 

dqds((j) : di — u\ — a 

for i = 1 , . . . , n — 1 

Hi = di + k 
k = h(ui+i/ui) 
di+i = di(u i+ i/ui) - a 
end for 

dfi . 

In practice, each di+\ may be written over its predecessor in a single variable d and, if the common 
subexpression ttj+i/uj is recognized, then only one division is needed if we use an auxiliary variable. 

Table 14.11 below shows that the operation count of one step of tridqds is comparable to three 
steps of dqds (table expresses only the number of floating point operations in the inner loops). 



tridqds 3 dqds steps 



Divisions 


2 


3 


Multiplications 


11 


6 


Additions 


5 


3 


Subtractions 


6 


3 


Assignments 


16 


12 


Auxiliary variables 


5 


2 



Table 4.1 

Operation count of tridqds and 3 dqds steps 



But to make three steps of dqds equivalent to tridqds we have to consider dqds in complex 
arithmetic and the total cost is raised by a factor of about 4. Thus, in complex arithmetic, three 
steps of dqds are much more expensive than one step of tridqds. 
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5. Error analysis. We turn to the effect of finite precision arithmetic on our algorithms. First 
consider the dqds algorithm. 

5.1. dqds. In the absence of over/underflow the algorithm enjoys the so-called mixed relative 
stability property. 

Theorem 5.1. Let dqds(cr) map L, U into computed L, U with no division by zero, over/underflow. 
Then well chosen small relative changes in the entries of both input and output matrices, of at most 3 
ulps each, produces new matrices, one pair mapped into the other, in exact arithmetic, by dqds(a). 

See the diagram in Figure 15.11 The remarkable feature here is that huge element growth does 
not impair the result. However this useful property does not guarantee that dqds returns accurate 
eigenvalues. See [7J [15] ■ For that, an extra requirement is needed such as positivity of all the 
parameters Uj, lj in the computation. This is the case for the eigenvalues of B T B where B is upper 
bidiagonal. 

What can be said in our case? We quote a result that is established by Yao Yang in his 
dissertation [25] and appears in [15] . The clever idea is not to look at the L and U separately but 
to study their exact product J = LU. 



LU dqds_^ l 

computed 



change each 



each 



h b y i uip I f e ha T\ 

u k by 3 ulps i \ lh,u k by 2 ulps 

L,U d -^~^ L,U 



LU = UL- o-l 



Figure 5.1. Effects of roundoff for dqds 

Theorem 5.2. [Y. Yang] If dqds(a) maps L,U into L,U (with no division by 0, over- 
flow/underflow) in the standard model of floating point arithmetic then there is a unique pair L, U 
such that, in exact arithmetic, dqds(a) maps L,U into L,U. Moreover, the associated tridiagonals 
satisfy, element by element, 

| offdiag(J) - offdiag(J)| < 2s\ offdiag(J)| 

| diag(J) - diag(J)| < e fo\u\ + + |f| + |u| + 2|d|) 

where e is the roundoff unit. 

This result is Corollary 3 in Section 9 of [T5J. It shows that it is only the diagonal of J that 
suffers large backward error in the case of element growth. Since u = d + I the last inequality may 
be written as 

|diag(J)-diag(J)| <e(2|u| + |I| + M|l| + |f|+3|d|) . 

Recall that d^ 1 = [(C/L) -1 ] .. , i — 1, . . . , n. Thus the indices vulnerable to large backward error 
belong to any very small entries [([/ i)^ 1 ] u - For this reason we reject L, U when, element by element, 

H|l| + |Z|+3|d| > 1000(|u| + (5.1) 

Recall that the error analysis is worst case. Recall also that the effect of a tiny Uk disappears for 
i > k+ 1. 
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5.2. tridqds. There are too many intermediate variables in this algorithm to permit a suc- 
cessful mixed error analysis. However each minor step in the algorithm consists of 3 elementary 
similarity transformations on work matrices F,G or G,F. See ... in Section [4.2.11 Recall that an 



elementary matrix here is of the form / + vej , with inverse / - 1 



vej , and v has at most 3 nonzero 
entries. So we examine the condition number of these 3 similarity transforms. Consult Appendix 1X1 
to follow the details. 

• The active part of Zi is 



The active part of Ci is 



and cond(Zi) ~ max{|? 



1 

-Xl/U-i 1 

—yi/h-i 

• The active part of Y% is 



1 

~ Xf / tiii 1 

-y r /ui 1 

-z r /ui 1 



and cond(A) ~ 1 + ^- + U^- 



h-1 



m 

h-1 



and cond(li) 



1 + I ^ 

Ui 



The variables xi,yi,x r ,y r , z r are formed from additions and multiplications of previous quantities. 
Note that it; is part of the input and so is assumed to be of acceptable size. We see that it is 
tiny values of and iii that lead to an ill-conditioned similarity at minor step i. In the simple 
dqds algorithm a small value of ttj (relative to leads to a large value of U+i and dj+i. In 

tridqds the effect of 3 consecutive transforms is more complicated. The message is the same: reject 
any transform that has more then modest element growth, as determined by (|5.1j) in the previous 
section. This challenge calls for further study. 

6. Implementation details. 

6.1. Deflation (n <— n — 1). Some of out criteria for deflating come from [TB], others are 
new. Consider both matrices UL and LU and the trailing 2x2 blocks, 



ln-1 + u n-l 1 



l n -2+U n -\ 1 
'n— l^n— 1 — 1 ~f 



Deflation (n <— n — 1) removes l n -i as well as u n . Looking at entry (n — 1, n — 1) of UL shows that 
a necessary condition is that l n -i be negligible compared to u n —i, 



/„_! < tol ■ |w n _l|, 



(6.1) 



for a certain tolerance tol close to roundoff unit e. 

The (n, n) entries of UL and LU suggest either u n +Acshift or l n _x+u n +Acshift as eigenvalues. 
Acshift is the accumulated shift (recall that dqds is a non-restoring transform). To make these 
consistent we require that 



|Z„_i | < tol ■ \u n + Acshift\. 



(6.2) 
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Finally we must consider the change S A in the eigenvalue A caused by setting l n _i = 0. Wc 
estimate <5A by starting from UL with l n -i — and then allowing l n -\ to grow. To this end let J 
be UL with l n -i = and (u n ,y T ,x) be the eigentriple for J. Clearly y T = e^. Now we consider 
perturbation theory with parameter l n -i- The perturbing matrix 5 J, as l n -\ grows, is 



By first order perturbation analysis 



|5A| 



\y T sJx\ 



and \\y\\2 = 1 in our case. So, 



|<5A| 



and wc use the crude bound 



|Z„_ie£(e n _i + e„u„)e^'_ 1 a;| _ |Z n -i«n||#n-i| 



x 2 



X 2 



\x n -i\ 

\\*h 



< 1. So, wc let Z„_i grow until the change 



\SX\ < |Z„-iu„| 

in eigenvalue A = u n is no longer acceptable. Our condition for deflation is then 

\l n -\u n \ < tol ■ \Acshift + u n \. 



(6.3) 



A similar first order perturbation analysis for LU with l n -i — will give our last condition for 
deflation. For the eigentriple (u n , y T , x) we also have y T — The perturbing matrix is now 



and 



\SX\ = 



l n -ie n {e^-iUn-i + e£) 



« 2 



; , \U n -lX n -i + x n \ , ... . . 

«n-i| n — ii < 1 1 + !) • 



a: 2 



Finally we require 



\ln-l\ (\u n -l\ + 1) < toZ • |Acs/li/t + U„|. 



(6.4) 



6.2. Splitting and deflation (n <— n — 2). Recall that the implicit L theorem was invoked 
to justify the tridqds algorithm. This result fails if any Ik, k < n — 1 vanishes. Consequently, 
checking for negligible values among the Ik is a necessity, not a luxury for increased efficiency. 
Consider J = UL in block form 



Jl 






1 








J 2 
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where /x = Uk+ih, k < n — 1. We can replace fi by when 

spectrum( Ji) U spectrum( J 2 ) = spectrum(J), to working accuracy. 

However we are not going to estimate the eigenvalues of J\ and J 2 ■ Instead we create a local criterion 
for splitting at (k + 1, k) as follows. Focus on the principal 4x4 window of J given by 



u/c-i + h-i 1 






Ukh-i Uk + h 


1 




Uk+ih 


Uk+i + h+i 


1 






Uk+2 + h+2 



Now Ji and J2 are both 2x2 and our local criterion is 

det( Ji) • det(J 2 ) = det(J), to working accuracy. 



(6.5) 



Let us see what this yields. Perform block factorization on J and note that the Schur complement 
of Ji in J is 



J'2 — J2 - 



u. 




JT 1 





1 



with 



where 



Thus 



deti 



Uk + lk -1 



deti = det(Ji) = uk-i(uk + h) + h-ih- 



Uk+ih Uk+i + h+i 

Uk+2h+l Uk+2 + h+2 



ju(u fe _i + lk-\)/deti 




Since dot is linear by rows 

dct(J 2 ) - dct(J2) = ^{uk-i + /fe_i)(ufe+2 + lk+2)/det 1 . 
Our criterion reduces to splitting only when 

det(J 2 ) = det(J 2 ), to working accuracy. 

Thus we require 

\lkU k +i(u k +2 + h+2){uk-i + lk-i)/deh\ < tol ■ |det(J 2 )| . 

Since 

deti = det(J 2 ) = u k+ i(u k+ 2 + h+2) + h+ih+2, 
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the criterion for splitting J at (fc + 1, k) is then 

\l k u k+1 (u k+2 + lk+2){uk-i + h-i)\ < tol ■ \detidet 2 \ . (6.6) 
Finally, to remove l k we also need l k to be negligible compared to u k , 

\l k \ < tol ■ \u k \. (6.7) 

Deflation (n n — 2) 

We use the same criterion for deflation (n <— n — 2), but because = = there is a 
common factor def2 on each side of (I6.6[) . Deflate the trailing 2x2 submatrix when 

\l n ^ 2 \ < tol ■ \u n - 2 \ (6.8) 

and 

|Z„- 2 (u„_3 + 'n-3)| < tol ■ |u„- 3 ("n-2 + ln-2) + Z n -3^n-2-| • (6.9) 

We omit the role of Acshift here because it makes the situation more complicated. We have to 
recall that tridqds uses restoring shifts and Acshift is always real. So, for complex shifts, det 2 is 
not going to zero. In fact 

\det 2 \ > |3(A)| 2 

where A is an eigenvalue of J 2 . 

When n = 3 these criteria simplify a lot. Both reduce to 

\h\ < tol ■ \m\. 

6.3. Shift strategy. Although tridqds may be, and has been, used to compute all the eigen- 
values, it seems sensible to include real dqds(cx) so that when all eigenvalues are real tridqds need 
not be called. 

As with LR, the dqds algorithm with no shift gradually forces large entries to the top and brings 
small entries towards the bottom. Before every transform both l n -\ and l n -2 are inspected. If 

l^n-i I < and |Z n _ 2 | < ^ 

then the code executes dqds transform with the Wilkinson shift or a 3dqds transform with Francis 
shifts depending on the sign of the discriminant. 

An unexpected reward for having both transforms available is to cope with a rejected transform. 
Our strategy is simply to use the other transform with the current shift. More precisely, given a 
complex shift a, if tridqds(a, a) is rejected we try dqds(u n ); if for real r, dqds(r) is rejected, we try 
tridqds(r,f). So far, this has not failed. 

More generally, an increase in the imaginary part of the shift increases diagonal dominance. At 
the extreme, consider a pair of pure imaginary shifts ifi, —in, ^ positive. The tridqds wants UL + u, 2 I 
to permit triangular factorization. The bigger is [i the better. 

A great attraction of IEEE arithmetic standard is that it allows the symbols inf and NaN. Thus 
there is no need for time consuming with if statements in the main loop. At the end of the loops we 
test for rejection or excessive element growth. We record the number of rejections. 
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7. Numerical Examples. Those who work with well defined problems have the habit of 
determining the "true" (or most accurate) solutions and comparing computed values with them to 
give the error. The condition number of every eigenvalue of a real symmetric matrix is 1, but only in 
the absolute sense. The relative condition number can vary. In our case even the absolute condition 
numbers can rise to oo and little is known about relative errors. 

We have discovered [9] that more often than not the eigenvalues of tridiagonal, and reasonable 
well balanced, matrices are well determined by an LU or ALDL T representation (A is a signature 
matrix). This is good news but much work remains. Our main focus is on the time it takes to 
get reasonable approximations, recognizing that we do not know how well the data defines the 
eigenvalues. 

We refer to the Ehrlich-Aberth algorithm (see Section |2~5]) as BGT and to our code simply as 
tridqds, although we combine tridqds with real dqds as described in Section 16.31 

Since we compare Matlab versions of all the codes we acknowledge that the elapsed times are 
accurate to only about 0.02 seconds. However this is good enough to show the ratios between BGT 
and the dqds codes. The efficiency of complex dqds is harder to determine. Sometimes the same, 
sometimes tridqds is 3 times faster. 

Since the number of iterations needed for convergence on our (modest) test bed has remained 
about An, we have not tried for a strategy as sophisticated as the one in [IB] , 



Bessel matrix 



Bessel matrices, associated with generalized Bessel polynomials, are nonsymmetric tridiagonals 
matrices defined by B^ 1 '^ — tridiag(/3, a, 7) with 

b at 

Oil = — , 7i = Pi = — — r, 

a a + 1 



and 



« ~ 2 

—0-, r-, , 7=2, ...,n, 

(2j + o-2)(2j + o-4)' 

b J +a - 2 



(2j + a-2)(2j + a-3)' 

b j 

(2j+o-1)(2j+o-2) 



, i = 2,...,n-l. 



Parameter & is a scaling factor and most authors take b = 2 and so do we. The case a € K is the most 
investigated in literature. The eigenvalues of B^' b \ well separated complex eigenvalues, suffer from 
ill-conditioning that increases with n - close to a defective matrix. In Pasquini |17j it is mentioned 
that the ill-conditining seems to reach its maximum when a ranges from —8.5 to —4.5. 

Our examples take Bn 4 ' 5 ' 2 ^ for n = 18,20,40. We show pictures for BGT and tridqds to 
illustrate the extreme sensitivity of some of the eigenvalues. The results of complex dqds are visually 
identical to tridqds, so we don't show them. In exact arithmetic the spectrum lies on an arc in the 
interior of the moon-shaped region. See Figure 17.11 
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BGT (.) and tridqds (+) 



0.1 
0.08 
0.06 
0.04 
0.02 


-0.02 
-0.04 
-0.06 
-0.08 
-0.1 



0.1 




0.08 




0.06 






0.04 




/ + 






/ + . 


0.02 








+ 













•f 


-0.02 






-0.04 




\ + 


-0.06 




-0.08 




-0.1 





0.05 0.1 



(a) n = 18 



BGT (.) and tridqds (+) 



BGT (.) and tridqds (+) 




-0.1 -0.05 




-0.12 -0.1 -0.08 -0.06 -0.04 -0.02 



(b) n = 20 

Figure 7.1. Eigenvalues of Bessel matrix B 



(c) n = 40 

(-4.5,2) 



Clement matrix 

The so-called Clement matrices (see [3]) 

C = tridiag(6, 0, c) 

with bj = j and Cj = b n -j, j — 1, . . . ,n — 1, have real eigenvalues 

± n — 1, n — 3, . . . , 1, for n even, 
± n — 1, n — 3, . . . , 0, for n odd. 

These matrices posed no serious difficulties. The initial zero diagonal obliges the dqds based methods 
to take care when finding an initial L, U factorization. 

The tridqds code uses only real dqds transforms as it should. Our accuracy is less than BGT 
but satisfactory. The complex dqds and tridqds performed identically. The ratio of elapsed times is 
the striking feature. 
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Our numerical tests have n — 100,200,400,800. The minimum and maximum relative errors, 
rel m i n and rel max , are shown in Table 17.11 and the CPU times in Table 17.21 



BGT complex dqds tridqds 



n 














100 





3 10~ 16 





3 10" i4 





3 10- 14 


200 





4 10~ 16 





3 10~ 13 





3 10~ 13 


400 





1 10~ 15 





3 HT 12 





3 10- 12 


800 





1 10~ 15 


2 10" 16 


2 HT 12 


2 10~ 16 


2 10- 12 



Table 7.1 
Relative errors for Clement matrices 



n 


BGT 


complex dqds 


tridqds 


100 


4.2 


0.06 


0.06 


200 


12.6 


0.12 


0.13 


400 


42.2 


0.45 


0.50 


800 


174.2 


1.8 


1.8 



Table 7.2 

CPU times in seconds for Clement matrices 



Graded matrix 

This matrix C was created in AT form with T — tridiag(6, a, c), 

aj = b 3 ; = c 3 ■= 3- (j ' _1) , j = l,...,n-l; a„ = 3" ( ™- 1} , 

and A = diag(5), Sj = (— l)^^ 1 )/ 2 ] , j = 1, . . . , n. The result is a balanced matrix with eigenvalues 
of different magnitude. 

Figure 17721 shows the approximated eigenvalues for n = 100. Table 17731 reports the CPU times. 



n 


BGT 


complex dqds 


tridqds 


50 


0.31 


0.02 


0.02 


100 


0.67 


0.06 


0.04 


200 


2.12 


0.14 


0.06 


400 




0.45 


0.35 



Table 7.3 

CPU times in seconds for the graded matrices 



BGT code reported the message "Exceed maximum number of operations" for n = 400. The 
performance of all methods for the flipped matrix is practically the same. 
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x10 - 19 BGT (.) and tridqds (+) 

1 | 1 1 1 r- 



-3 - 



-0.5 0.5 1 1.5 



Figure 7.2. Eigenvalues of the graded matrix with n = 100 



Matrix with clusters 



Matrix Test 5 in pQ, 

C = J D~ 1 tridiag(l,a,l), D = diag(/3), a,/3eW l 
a k = W 5 ^ k -(-l)W 4 K & = (-l)L*/ 3 J, k=l,...,n, 

seems to be a challenging test matrix. It was designed to have large, tight clusters of eigenvalues 
around 10~ 5 , — 10 5 and 10 5 . Half the spectrum is around 10~ 5 and the rest is divided unevenly 
between — 10 5 and 10 5 . The diagonal alternates between entries of absolute value 10 5 and 10 -5 and 
so, for dqds codes, there is a lot of rearranging to do. When n > 100 it is not clear what is meant 
by accuracy. 

All three codes obtain the correct number of eigenvalues in each cluster and the diameters of 
the clusters are all about 10~ 5 . The striking feature is the time taken. See Table [7T4l 



n 


BGT 


complex dqds 


tridqds 


50 


1.2 


0.03 


0.01 


100 


4.5 


0.05 


0.03 


200 


20.1 


0.14 


0.08 


400 


85.0 


0.61 


0.13 



Table 7.4 
CPU times in seconds for Test 5 matrix 
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Matrix Test 4 

For matrix Test 4 in [I], 

C = D^ 1 tridiag(l, a, 1), D = diag(/3), a,(3e E" 
a k = (-l) k , /3 fe = 20-(-l)L fe /5J 5 k=l,...,n, 

the performance of the three codes is shown in Table 17.51 Figure 17.31 shows the eigenvalues of this 
matrix for n = 50. 



n 


BGT 


complex dqds 


tridqds 


50 


1.0 


0.05 


0.03 


100 


4.3 


0.08 


0.03 


200 


17.8 


0.27 


0.09 


400 


78.6 


1.0 


0.44 


800 


342.6 


5.5 


2.4 



Table 7.5 

CPU times in seconds for Test 4 matrix 



0.05 
0.04 
0.03 
0.02 
0.01 


-0.01 
-0.02 
-0.03 
-0.04 



-0.05 !— 
-0.1 



BGT (.) and tridqds (+) 



i 



-0.05 



0.05 



0.1 



Liu matrix 



Figure 7.3. Eigenvalues of Test 4 matrix with n = 50 



Z. A. Liu [13] devised an algorithm to obtain one-point spectrum unreduced tridiagonal matrices 
of arbitrary dimension nx n. These matrices have only one eigenvalue, zero with multiplicity n, and 
the Jordan form consists of one Jordan block. Our code tridqds computes this eigenvalue exactly 
(and also the generalized eigenvectors) using the following method which is part of the prologue. 
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The best place to start looking for eigenvalues of a tridiagonal matrix C = tridiag(a, b, c) is at 
the arithmetic mean which we know (fi — trace(C)/n). Before converting to J— form and factoring, 
we check whether /i is an eigenvalue by using the 3-term recurrence to solve 

n-1 

(nl - C)x = e n p n (fi)/ Y[ a- 

i=l 

Here 

x\ = 1, x 2 = (fi - a 2 )/ci, 
Xj+i = — [(m - aj)xj - bj-iXj-i] , j = 2, . . .,n - 1, 

and 

v := (fi - a n )x n - b n -ix n -i ^=p„(/i)/ Y[ ■ 

If, by chance, v vanishes, or is negligible compared to ||cc|j, then \i is an eigenvalue (to working 
accuracy) and x is an eigenvector. To check its multiplicity we differentiate with respect to [i and 
solve 

(fil - C)y = x 

with 2/1 = 0, y 2 = 1 = x' 2 (= x\). If 

n-1 
i=l 

vanishes, or is negligible w.r.t. \\y\\, then we continue the same way until the system is inconsistent 
or there are n generalized eigenvectors. 

Usually v ^ and the calculation appears to have been a waste. This is not quite correct. In 
exact arithmetic, triangular factorization of \il — C or \il — J, where J = ACA -1 , will break down 
if, and only if, Xj vanishes for 1 < j < n. So our code examines min, \xj | and if it is too small w.r.t. 
its neighbors and w.r.t. ||a;|| then we do not choose \x as our initial shift. Otherwise we do obtain 
initial L and U from J — fil = LU . 

8. Conclusions and future work. We conclude that, working together, a single dqds trans- 
form with real shifts and our tridqds transform with complex conjugate pairs of shifts constitute the 
right tool for computing the eigenvalues of real tridiagonal matrices. 

However there is far more work to be done for the following reasons. In a previous paper we 
discovered that, surprisingly often, eigenvalues are determined to, not high, but adequate relative 
accuracy; tiny relative changes 77 in the parameters that define the matrix produce relative changes 
in the eigenvalue of the order of 10 3 r/ or 10 4 r/. This is good news. We cannot tell in advance when 
this occurs. In our opinion a relative condition number should be returned with each eigenvalue. 
This requires an approximation to the row and column eigenvectors, whether or not the user needs 
them. 

We envision software that computes an initial approximation to each eigenvalue and then in- 
vokes a generalized Rayleigh quotient iteration to both compute eigenvectors and obtain a refined 
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eigenvalue approximation, along with the smallest residual norms that could be achieved. Then the 
relative condition number can be formed. See [5]. 

Another practical feature is to scan the initial matrix to extract Gersgorin disks and a tight box 
in the complex plane that contains the spectrum. Matrices from industrial sources frequently permit 
"localization" of the eigenvectors belonging to certain parts of the spectrum. One consequence is 
that the relevant eigenvectors, and eigenvalues, may be obtained from small submatrices. 

It is also important to scale and normalize the initial matrix and make use of the splitting that 
occurs with big matrices. Currently our shift strategy is quite straightforward and it is both difficult 
and worthwhile to improve it. There are plenty of challenges to be met before software for this 
real tridiagonal problem can be installed in packages such as LAPACK, not to mention parallel 
computation and SCALAPACK. 



Appendix A. Implementation details of minor step i. In this appendix we show how the 
calculations involved in each minor step of tridqds can be organized. 

For each minor step i, i — 2, . . . ,n — 3, consider F and G as in (|4.3[) . Denote the 2x1 bulge in 
F, indicated with plus signs, by [xi yi] T ■ And denote the entries Gr»+i,i and Gi+2,i> indicated 
with *,+,+, by \x r y r z r ] T . Subscripts I and r derive from "left" and "right", respectively. This 
way we have 



F 



1 

xi u i+ i 
VI 



1 

Ui+2 



G 



Ui-l 1 

Vr 1 

Z r tj_|_l 1 



and the minor step i can be accomplished using only these auxiliary variables. 



Minor step i 



a) • Matrices Z i 1 and Z% 



Ui Ui 



z; 



m 1 
1 



• The effect of Z i 1 and the effect of Zi 
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where 



1 

k-i 1 

xi u i+ i 

yi 



l 

Ui+2 



Ui-l 1 

Vr 1 



b) • Matrices C i 1 and Ci 



x r * U{ -\- y r 



C7 1 



1 

xi 1 

yi 



l 

-Xl 1 

-yi i 



where 



The effect of C7 1 



xi 

yi 



-Xl/k-l 

-yi/k-i 



c~ 1 f = 



l 

k-i i 

Xi U i+ i 1 



The effect of d 



yi u i+2 



Ui 1 

Vr k+1 1 

z r h+2 
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where 



Ui 
Vr 



c) • Matrices Y i 1 and Y{ 



rr 1 



where 



The effect of Yr 



where 



The effect of Y t 



where 



1 

x r 

Vr 



FYr 1 



Vi 



YiG 



X ip X I 

Ur-Xl 

z r -yi-xi* 



-yi * k 



-2 



Y, 



1 

-Vr 



Vr 



X r I lli 
Vr/Ui 
Z r I Hi 



l 

k 1 

Xl U i+2 

Vl 



1 

Ui+3 



Xl + Vr + X r * U i+ i 

- yi + z r + y r * u i+2 

- z r * u i+3 



1 

Vr 



1 

h+2 



Xf ^ 1 X >p 

Vr < k+1 - Vr 

Zip ^ Zip 
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Appendix B. tridqds algorithm. 



tridqds(a, a) : 

% step 1 

x r = 1; y r = h; z r = 
% the effect of Z 1 

% the matrix C^ 1 

xi = (m + h) 2 + u 2 h - 2(5Rcr)(ui + h) + \a\ 2 
Vi = -U2I1U3I2/X1 

xi = -u 2 h(ui + h + U2 + h - 2(lk(j))/xi 
% the effect of d 

x r = Vr - Xf, y r = z r - J/J - Xi * Z 2 ; z r = -y x * l 3 
% the matrix Yf 1 

x r = x r /u\\ y r — y r /iii; z r — z r /u\ 

% the effect of Yf 1 

l\ = xi + y r + x r * U2 

xi = yi + z r + y r * u 3 ; y% = z r * u 4 

% the effect of Yi 

X ^> — 1 Xrp ^ yv — ^2 Ut^ Zy — Zy 



for i = 2, . . . , n — 3 
% the effect of 
Xf — Xf -fc I yv 
% the matrix C~ x 
xi = -xi/k-i; yt = -yi/k-v, 
% the effect of & 

Hi X r X\ , 

x r =y r -x l ; y r = z r - y t - x t * ; z r = -y t * l i+2 
% the matrix Yr 1 

x r =x r /ui] y r . = y r /u i ; z r — z r /ui 

% the effect of Yf 1 

k = xi + y r + x r * u l+ i 

Xi =yi + z r + y r * u i+2 ; y t = z r * u l+3 

% the effect of Yi 

X r — 1 - Vr ^i+l Vri %r %r 

end for 
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% step n-2 

% the effect of Z„_ 2 

% the matrix £~1 2 

x\ = -xi/l n - 3 ; yi = -yi/l n -3; 

% the effect of £„_ 2 

u n -2 = x r - xr, 

X r = Vr ~ %l\ Vr = Z r — yi — Xi * l n —l 

% the matrix Y~\ 

X r =X r /u n - 2 ] yr=Vr/u n -2 

% the effect of Y~\ 

l n -2 = xi + y r + x r * U n -i 

Xl=Vl+Vr* u n 

% the effect of F„_ 2 

X r 1 X r , y r ln—1 yr 



% step n-1 
% the effect of Z n _i 
x r — x r * u n ~\ -r 
% the matrix C~\ 

Xl = -Xljln-2 

% the effect of £„_i 

U n —i — X r X/, 

x r y r - x\ 

% the matrix Y~\ 

X r - X r flL n —\ 

% the effect of Y~\ 

l-n—1 = %l H" * 

% the effect of Y n - 1 

Xy 1 Xy 



% step n 

% the effect of Z n 

Xf Xy *f* t-Liyi 

% the matrix L~ x = I 
% the effect of C n 

u n x r , 

% the matrix Y~ x = I 



C. Ferreira and B. Parlett 
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